Toward perfect reads

Authors

  • Antoine Limasset
  • Jean-Francois Flot
  • Pierre Peterlongo
Abstract

We propose a new method to correct short reads using de Bruijn graphs, and implement it as a tool called Bcool. As a first step, Bcool constructs a corrected compacted de Bruijn graph from the reads. This graph is then used as a reference, and each read is corrected according to its mapping on the graph. We show that this approach yields better correction than k-mer-spectrum techniques while remaining scalable, which makes it applicable to human-size genomic datasets and beyond. The implementation is open source and available at github.com/Malfoy/BCOOL
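The two steps described in the abstract (build a cleaned de Bruijn graph, then correct reads by following them through it) can be illustrated with a toy sketch. This is not Bcool's implementation: the function names, the abundance threshold `min_count`, and the greedy path-following rule are all assumptions made for illustration. The sketch also ignores reverse complements, graph compaction, and any correction to the left of the first matching k-mer.

```python
from collections import Counter

def solid_kmers(reads, k, min_count):
    """Toy stand-in for a 'corrected' de Bruijn graph: keep only k-mers
    seen at least min_count times (hypothetical threshold)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return {kmer for kmer, c in counts.items() if c >= min_count}

def successors(kmer, graph):
    """Bases b such that the k-mer obtained by shifting and appending b is in the graph."""
    return [b for b in "ACGT" if kmer[1:] + b in graph]

def correct_read(read, graph, k):
    """Greedy illustration of 'mapping a read on the graph':
    anchor on the first k-mer present in the graph, then walk rightwards.
    A base that disagrees with the graph is replaced only when the graph
    offers exactly one successor; otherwise the walk stops."""
    anchor = next(
        (i for i in range(len(read) - k + 1) if read[i:i + k] in graph),
        None,
    )
    if anchor is None:
        return read  # nothing to anchor on: leave the read untouched
    corrected = list(read)
    current = read[anchor:anchor + k]
    for pos in range(anchor + k, len(read)):
        options = successors(current, graph)
        base = corrected[pos]
        if base not in options:
            if len(options) != 1:
                break  # dead end or ambiguous branch: stop correcting
            base = options[0]
            corrected[pos] = base
        current = current[1:] + base
    return "".join(corrected)
```

In this sketch, a read whose erroneous base falls on an unbranched path of the graph is rewritten to match the path, while a read sharing no k-mer with the graph is returned unchanged.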

Similar resources

Sensitivity of Perfect and Stone-Wales Defective BNNTs Toward NO Molecule: A DFT/M06-2X Approach

The monitoring and control of environmental pollution are very important in biological and industrial processes, and interest is growing in the development of suitable gas-sensitive materials and hazardous chemical removal devices. In this work, the highly parameterized, empirical exchange-correlation functional M06-2X was employed to investigate the electronic sensitivity of pe...

Haplotype Inference from Single Short Sequence Reads Using a Population Genealogical History Model

High-throughput sequencing is currently a major transforming technology in biology. In this paper, we study a population genomics problem motivated by the newly available short reads data from high-throughput sequencing. In this problem, we are given short reads collected from individuals in a population. The objective is to infer haplotypes with the given reads. We first formulate the computat...

Evaluation of window cohabitation of DNA sequencing errors and lowest PHRED quality values.

When analyzing sequencing reads, it is important to distinguish between putative correct and wrong bases. An open question is how a PHRED quality value is capable of identifying the miscalled bases and if there is a quality cutoff that allows mapping of most errors. Considering the fact that a low quality value does not necessarily indicate a miscalled position, we decided to investigate if win...

ALLPATHS: de novo assembly of whole-genome shotgun microreads.

New DNA sequencing technologies deliver data at dramatically lower costs but demand new analytical methods to take full advantage of the very short reads that they produce. We provide an initial, theoretical solution to the challenge of de novo assembly from whole-genome shotgun "microreads." For 11 genomes of sizes up to 39 Mb, we generated high-quality assemblies from 80x coverage by paired 3...

On using Longer RNA-seq Reads to Improve Transcript Prediction Accuracy

Over the past decade, sequencing read length has increased from tens to hundreds and then to thousands of bases. Current cDNA synthesis methods prevent RNA-seq reads from being long enough to entirely capture all the RNA transcripts, but long reads can still provide connectivity information on chains of multiple exons that are included in transcripts. We demonstrate that exploiting full connect...

Journal:
  • CoRR

Volume: abs/1711.03336

Publication year: 2017